A heuristically accelerated reinforcement learning method for maintenance policy of an assembly line
Abstract
This paper investigates the maintenance policy for a two-machine one-buffer (2M1B) assembly line system. We assume that the observed quality states of the deteriorating machines in the system are characterized by multiple decreasing yield stages. A semi-Markov decision process (SMDP) model is used to describe the problem, and a heuristically accelerated multi-agent reinforcement learning (HAMRL) method is used to solve the problem model. Asynchronous updating rules are introduced into the HAMRL method. The production time, preventive maintenance (PM) time, and corrective repair (CR) time are random, and the deterioration mode of the device is not fixed. A comparison with reinforcement learning (RL) methods based on simulated annealing search (SAS) exploration and neighborhood search (NS) exploration is also presented. The empirical results indicate that the proposed method can speed up the learning process and has a certain advantage for larger state spaces and more practical problems. The maintenance strategy of the 2M1B system is obtained under the condition of a convergent average cost rate. The work provides new insights into the application of selection techniques.
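To make the idea of heuristic acceleration in an SMDP setting more concrete, here is a minimal Python sketch. It is not the authors' HAMRL algorithm: it is a single agent on a hypothetical single-machine model (no buffer, no second machine, no asynchronous multi-agent updates) with four yield stages, where stage 3 means failure and forces corrective repair. Action selection follows the HAQL idea of biasing the greedy choice with a heuristic function H, adapted here to cost minimisation as argmin over Q(s,a) minus xi times H(s,a). The value update is an average-cost SMDP Q-learning rule in the spirit of SMART. The costs, sojourn-time distributions, deterioration probability, heuristic, and the names `step`, `heuristic`, `greedy_action`, and `learn` are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical single-machine toy model (NOT the paper's 2M1B system).
N_STAGES = 4            # yield stages 0 (best) .. 3 (failed)
CONTINUE, PM = 0, 1     # keep producing, or perform preventive maintenance
ACTIONS = (CONTINUE, PM)


def step(state, action, rng):
    """Simulate one decision epoch; return (next_state, cost, sojourn_time)."""
    if state == N_STAGES - 1:                  # failed: corrective repair is forced
        return 0, 50.0, rng.expovariate(1 / 3.0)
    if action == PM:                           # PM restores the best yield stage
        return 0, 10.0, rng.expovariate(1 / 1.0)
    tau = rng.expovariate(1 / 2.0)             # random production time
    cost = 4.0 * state * tau                   # lost yield grows with the stage
    nxt = state + 1 if rng.random() < 0.3 else state
    return nxt, cost, tau


def heuristic(state, action):
    """Hypothetical H(s, a): favour PM in late yield stages, CONTINUE otherwise."""
    favoured = PM if state >= 2 else CONTINUE
    return 1.0 if action == favoured else 0.0


def greedy_action(Q, state, xi):
    # HAQL-style rule adapted to cost minimisation: argmin_a [Q(s,a) - xi * H(s,a)]
    return min(ACTIONS, key=lambda a: Q[(state, a)] - xi * heuristic(state, a))


def learn(n_steps=20000, alpha=0.05, xi=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    rho = 0.0                                  # estimated average cost rate
    total_cost = total_time = 0.0
    state = 0
    for _ in range(n_steps):
        greedy = greedy_action(Q, state, xi)
        action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy
        nxt, cost, tau = step(state, action, rng)
        # Relative (average-cost) SMDP target: cost - rho * tau + min_b Q(s', b)
        target = cost - rho * tau + min(Q[(nxt, b)] for b in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        if action == greedy:                   # SMART-style: rho from greedy steps only
            total_cost += cost
            total_time += tau
            if total_time > 0:
                rho = total_cost / total_time
        state = nxt
    # Final policy uses the learned Q values alone; the heuristic only guided learning
    policy = {s: min(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STAGES)}
    return policy, rho


policy, rho = learn()
print("policy:", policy, "average cost rate:", round(rho, 2))
```

The heuristic only shifts which action is tried greedily during learning; the update rule itself is unchanged, which is the property HAQL-style methods rely on to keep the underlying Q-learning convergence behaviour. In this sketch, setting `xi=0.0` recovers plain epsilon-greedy SMDP Q-learning for comparison.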
Similar resources
Heuristically-Accelerated Reinforcement Learning: A Comparative Analysis of Performance
This paper presents a comparative analysis of three Reinforcement Learning algorithms (Q-learning, Q(λ)-learning and QS-learning) and their heuristically-accelerated variants (HAQL, HAQ(λ) and HAQS) where heuristics bias action selection, thus speeding up the learning. The experiments were performed in a simulated robot soccer environment which reproduces the conditions of a real competition lea...
Market-Based Dynamic Task Allocation Using Heuristically Accelerated Reinforcement Learning
This paper presents a Multi-Robot Task Allocation (MRTA) system, implemented on a RoboCup Small Size League team, where robots participate in auctions for the available roles, such as attacker or defender, and use Heuristically Accelerated Reinforcement Learning to evaluate their aptitude to perform these roles, given the situation of the team, in real-time. The performance of the task allocati...
Heuristically Accelerated Reinforcement Learning: Theoretical and Experimental Results
Since finding control policies using Reinforcement Learning (RL) can be very time consuming, in recent years several authors have investigated how to speed up RL algorithms by making improved action selections based on heuristics. In this work we present new theoretical results – convergence and a superior limit for value estimation errors – for the class that encompasses all heuristics-based al...
Heuristically Accelerated Q-Learning: A New Approach to Speed Up Reinforcement Learning
This work presents a new algorithm, called Heuristically Accelerated Q–Learning (HAQL), that allows the use of heuristics to speed up the well-known Reinforcement Learning algorithm Q–learning. A heuristic function H that influences the choice of the actions characterizes the HAQL algorithm. The heuristic function is strongly associated with the policy: it indicates that an action must be taken ...
Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning
Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs o...
Journal
Journal title: Journal of Industrial and Management Optimization
Year: 2023
ISSN: 1547-5816, 1553-166X
DOI: https://doi.org/10.3934/jimo.2022047